The Optimized Segment Support Map for the Mining of Frequent Patterns
Authors
Abstract
Computing the frequency of a pattern is a key operation in data mining algorithms. We describe a simple yet powerful way of speeding up any form of frequency counting that satisfies the monotonicity condition. Our method, the optimized segment support map (OSSM), is based on a simple observation about data: real-life data sets are not necessarily uniformly distributed. The OSSM is a lightweight structure that partitions the collection of transactions into segments so as to reduce the number of candidate patterns that require frequency counting. We study the following problems: (i) What is the optimal number of segments to use (the segment minimization problem)? (ii) Given a user-determined number of segments, what is the best segmentation/composition of the segments (the constrained segmentation problem)? For the segment minimization problem, we provide a thorough analysis and a theorem establishing the minimum number of segments for which no accuracy is lost in using the OSSM. For the constrained segmentation problem, we develop various algorithms and heuristics to facilitate segmentation. Our experimental results on both real and synthetic data sets show that our segmentation algorithms and heuristics can efficiently generate OSSMs that are compact and effective.
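The abstract does not spell out how a segment map reduces counting work, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes each segment stores the support of every single item, and that the support of a pattern is bounded above by summing, over segments, the smallest per-segment support among the pattern's items. The names `build_segment_support_map`, `support_upper_bound`, and `exact_support`, and the naive equal-size contiguous segmentation, are hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]
SegmentMap = List[Dict[str, int]]


def build_segment_support_map(transactions: List[Transaction],
                              num_segments: int) -> SegmentMap:
    """Split transactions into contiguous, roughly equal segments and
    count single-item supports per segment (naive segmentation, assumed)."""
    # Ceiling division; max(1, ...) avoids a zero step on empty input.
    size = max(1, -(-len(transactions) // num_segments))
    segment_map: SegmentMap = []
    for start in range(0, len(transactions), size):
        counts: Dict[str, int] = {}
        for transaction in transactions[start:start + size]:
            for item in transaction:
                counts[item] = counts.get(item, 0) + 1
        segment_map.append(counts)
    return segment_map


def support_upper_bound(segment_map: SegmentMap, pattern: FrozenSet[str]) -> int:
    """Upper bound on the support of `pattern`: within each segment the
    pattern cannot occur more often than its rarest item does."""
    return sum(min(counts.get(item, 0) for item in pattern)
               for counts in segment_map)


def exact_support(transactions: List[Transaction], pattern: FrozenSet[str]) -> int:
    """Full frequency count, the operation the bound tries to avoid."""
    return sum(1 for transaction in transactions if pattern <= transaction)


# Toy, non-uniform data: 'a' and 'b' co-occur only in the first two transactions.
transactions = [frozenset(t) for t in ("ab", "ab", "a", "a", "b", "b")]
pattern = frozenset("ab")

print(support_upper_bound(build_segment_support_map(transactions, 1), pattern))  # 4
print(support_upper_bound(build_segment_support_map(transactions, 3), pattern))  # 2
print(exact_support(transactions, pattern))                                      # 2
```

With a minimum support threshold of 3, the single-segment bound of 4 would still force a full count of the pattern, while the three-segment bound of 2 lets the candidate be discarded without scanning the transactions. In this toy case three segments already make the bound exact, which loosely illustrates why the choice and composition of segments matter.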
Similar resources
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns, and so data confidentiality is preserved ag...
Finding Frequent Patterns in the Holy Quran Using Text Mining Methods
The Quran's text differs from other texts in its exceptional concepts, ideas, and subjects. Recognizing valuable implicit patterns in vast amounts of data has lately captured the attention of many researchers. Text mining provides the means to extract information from texts, and it can help us reach our objective in this regard. In recent years, Text Mining on Quran and e...
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns that are highly accessed in mobile web service sequences. Unlike the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
MINING FUZZY TEMPORAL ITEMSETS WITHIN VARIOUS TIME INTERVALS IN QUANTITATIVE DATASETS
This research aims at proposing a new method for discovering frequent temporal itemsets in continuous subsets of a dataset with quantitative transactions. It is important to note that although these temporal itemsets may have relatively high support or occurrence within particular time intervals, they do not necessarily get similar support across the whole dataset, which makes i...
Publication date: 2001